 machine learning and data mining


Defending against Backdoor Attack on Deep Neural Networks

arXiv.org Artificial Intelligence

Although deep neural networks (DNNs) have achieved great success in various computer vision tasks, they have recently been found to be vulnerable to adversarial attacks. In this paper, we focus on the so-called \textit{backdoor attack}, which injects a backdoor trigger into a small portion of the training data (also known as data poisoning) such that the trained DNN misclassifies examples containing this trigger. Specifically, we carefully study the effect of both real and synthetic backdoor attacks on the internal response of vanilla and backdoored DNNs through the lens of Grad-CAM. Moreover, we show that the backdoor attack induces a significant bias in neuron activation in terms of the $\ell_\infty$ norm of an activation map compared to its $\ell_1$ and $\ell_2$ norms. Spurred by our results, we propose \textit{$\ell_\infty$-based neuron pruning} to remove the backdoor from the backdoored DNN. Experiments show that our method can effectively decrease the attack success rate while maintaining high classification accuracy on clean images.
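
The abstract does not spell out the pruning procedure, so the following is only a rough sketch of one way $\ell_\infty$-based channel pruning might look. It assumes that each channel's activation map is a small 2-D grid, that channels are ranked by their largest absolute activation, and that the top-ranked ones are zeroed. The function names, the `num_prune` parameter, and the toy numbers are all hypothetical and are not taken from the paper.

```python
def linf_norm(activation_map):
    """Largest absolute value in a 2-D activation map (a list of rows)."""
    return max(abs(v) for row in activation_map for v in row)


def channels_to_prune(activation_maps, num_prune):
    """Indices of the num_prune channels with the largest l_inf norm (assumed rule)."""
    norms = [linf_norm(m) for m in activation_maps]
    ranked = sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:num_prune])


def prune(activation_maps, pruned):
    """Zero out the pruned channels and leave the others untouched."""
    pruned = set(pruned)
    return [
        [[0.0 for _ in row] for row in m] if i in pruned else m
        for i, m in enumerate(activation_maps)
    ]


# Toy activations for three channels on a triggered input (made-up numbers).
maps = [
    [[0.1, 0.2], [0.3, 0.1]],
    [[0.0, 9.5], [0.2, 0.1]],  # a single large spike dominates the l_inf norm
    [[0.5, 0.4], [0.6, 0.5]],
]
selected = channels_to_prune(maps, num_prune=1)
pruned_maps = prune(maps, selected)
```

In a real network the activation maps would come from a chosen layer evaluated on a batch of inputs, and the number of pruned channels would need tuning against clean accuracy.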


Semi-Supervised Constrained Clustering: An In-Depth Overview, Ranked Taxonomy and Future Research Directions

arXiv.org Artificial Intelligence

Clustering is a well-known unsupervised machine learning approach capable of automatically grouping discrete sets of instances with similar characteristics. Constrained clustering is a semi-supervised extension to this process that can be used when expert knowledge is available to indicate constraints that can be exploited. Well-known examples of such constraints are must-link (indicating that two instances belong to the same group) and cannot-link (indicating that two instances definitely do not belong together). The research area of constrained clustering has grown significantly over the years, with a large variety of new algorithms and more advanced types of constraints being proposed. However, no unifying overview is available to easily understand the wide variety of available methods, constraints and benchmarks. To remedy this, this study presents in detail the background of constrained clustering and provides a novel ranked taxonomy of the types of constraints that can be used in constrained clustering. In addition, it focuses on instance-level pairwise constraints and gives an overview of their applications and historical context. It then presents a statistical analysis covering 307 constrained clustering methods, categorizes them according to their features, and provides a ranking score indicating which methods have the most potential based on their popularity and validation quality. Finally, based upon this analysis, potential pitfalls and future research directions are provided.
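
To make the must-link and cannot-link constraints concrete, here is a minimal sketch that checks a given cluster assignment against a set of pairwise constraints. The function name and the toy data are illustrative assumptions and do not correspond to any particular method from the survey.

```python
def violated_constraints(labels, must_link, cannot_link):
    """List the pairwise constraints that a cluster assignment breaks.

    labels[i] is the cluster id of instance i; each constraint is a pair
    of instance indices.
    """
    violated = []
    for a, b in must_link:
        if labels[a] != labels[b]:  # should share a cluster but do not
            violated.append(("must-link", a, b))
    for a, b in cannot_link:
        if labels[a] == labels[b]:  # should be separated but are not
            violated.append(("cannot-link", a, b))
    return violated


# Five instances assigned to two clusters (made-up example).
labels = [0, 0, 1, 1, 0]
must_link = [(0, 1), (2, 4)]
cannot_link = [(0, 2), (3, 2)]
result = violated_constraints(labels, must_link, cannot_link)
```

A constrained clustering algorithm could use such a count either as a hard feasibility check or as a penalty term added to its objective.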


JMIDS - International Journal on Computer Vision, Machine Learning and Data Mining

#artificialintelligence

Avestia Publishing has launched the Journal of Machine Intelligence and Data Science (JMIDS). The journal is published in English, follows a continuous publication model, and is open access.


What is the difference between data mining and machine learning?

#artificialintelligence

I will first explain what artificial intelligence, machine learning and data mining are. Then, I will answer the question. What is artificial intelligence and machine learning? Artificial intelligence is a field of research that aims to develop software that can perform tasks requiring intelligence. What counts as a task that requires intelligence is open to debate; examples include playing chess, translating documents, writing a novel, or choosing the best route to drive from one location to another.


Localized Adversarial Training for Increased Accuracy and Robustness in Image Classification

arXiv.org Machine Learning

Today's state-of-the-art image classifiers fail to correctly classify carefully manipulated adversarial images. In this work, we develop a new, localized adversarial attack that generates adversarial examples by imperceptibly altering the backgrounds of normal images. We first use this attack to highlight the unnecessary sensitivity of neural networks to changes in the background of an image, then use it as part of a new training technique: localized adversarial training. By including locally adversarial images in the training set, we are able to create a classifier that suffers less loss than a non-adversarially trained counterpart model on both natural and adversarial inputs. The evaluation of our localized adversarial training algorithm on MNIST and CIFAR-10 datasets shows decreased accuracy loss on natural images, and increased robustness against adversarial inputs.
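
The abstract does not describe how the localized attack is computed, so the sketch below swaps in a simple sign-of-gradient step (in the style of FGSM) and restricts it to background pixels. The background mask, the loss gradient, and `epsilon` are all assumed inputs supplied by the caller; none of the names come from the paper.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def localized_perturbation(image, grad, background_mask, epsilon):
    """Perturb only background pixels, clipping results to [0, 1].

    image, grad and background_mask are 2-D lists of the same shape;
    a mask value of 1 marks a background pixel, 0 marks the object.
    """
    out = []
    for img_row, g_row, m_row in zip(image, grad, background_mask):
        out.append([
            min(1.0, max(0.0, x + epsilon * sign(g))) if m else x
            for x, g, m in zip(img_row, g_row, m_row)
        ])
    return out


# 2x2 toy image with a made-up loss gradient and background mask.
image = [[0.5, 0.5], [0.0, 1.0]]
grad = [[1.0, -2.0], [-0.3, 0.7]]
mask = [[1, 0], [1, 1]]
adv = localized_perturbation(image, grad, mask, epsilon=0.25)
```

Adversarial training in this spirit would then mix such perturbed images into the training set alongside the natural ones.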


Python for Machine Learning and Data Mining

#artificialintelligence

Data Mining and Machine Learning are hot topics in the business intelligence strategy of many companies around the world. These fields give data scientists the opportunity to explore data in depth, find valuable new information, and build intelligent algorithms that can "learn" from the data and make optimal decisions for classification or forecasting tasks. This course takes a practical approach, so I'll provide useful code snippets and teach you how to build professional desktop applications for machine learning and data mining with the Python language. We'll also work with real data from a real trading company and present our results professionally with richly illustrated charts. We'll start at the basic level, covering the main topics of the Python language and the programs needed to develop our applications.


How do you explain Machine Learning and Data Mining to a layman?

#artificialintelligence

Suppose you go shopping for mangoes one day. The vendor has laid out a cart full of mangoes. You can handpick the mangoes, the vendor will weigh them, and you pay according to a fixed Rs per Kg rate (typical story in India). Obviously, you want to pick the sweetest, most ripe mangoes for yourself (since you are paying by weight and not by quality). How do you choose the mangoes?


Difference of Data Science, Machine Learning and Data Mining

@machinelearnbot

The amount of digital data in existence is growing at a rapid pace. The number is doubling every two years and it is completely transforming our basic mode of existence. According to a paper from IBM, about 2.5 billion gigabytes of data were generated every day in 2012. Another article from Forbes informs us that data is growing faster than ever. The same article suggests that by the year 2020, about 1.7 billion of new information will be developed per second for all the human inhabitants on this planet. As data grows at an ever faster pace, new terms associated with processing and handling data are emerging.


Machine Learning and Data Mining for Computer Security: Methods and Applications (Advanced Information and Knowledge Processing): Marcus A. Maloof: 9781846280290: Amazon.com: Books

@machinelearnbot

Intrusion detection and analysis has received a lot of criticism and publicity over the last several years. The Gartner report took a shot saying Intrusion Detection Systems are dead, while others believe Intrusion Detection is just reaching its maturity. The problem that few want to admit is that the current public methods of intrusion detection, while they might be mature, based solely on the fact they have been around for a while, are not extremely sophisticated and do not work very well. While there is no such thing as 100% security, people always expect a technology to accomplish more than it currently does, and this is clearly the case with intrusion detection. It needs to be taken to the next level with more advanced analysis being done by the computer and less by the human. The current area of Intrusion Detection is begging for Machine Learning to be applied to it.


The Real Difference Between Data Science, Machine Learning and Data Mining

@machinelearnbot

Data is everywhere. The amount of digital data in existence is rising at a quick pace. The number is multiplying at regular intervals and it is profoundly changing human life. According to a paper from IBM, around 2.5 billion gigabytes of data were created daily in 2012. Another article from Forbes tells us that data is growing faster than at any other time. The same article suggests that by the year 2020, around 1.7 billion of new data will be produced every second for all the human beings on this planet.